NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Evaluating Tuning Opportunities of the LLVM/OpenMP Runtime

https://doi.org/10.1109/SCW63240.2024.00131

Chheda, Smeet; Verma, Gaurav; Tian, Shilei; Chapman, Barbara; Doerfert, Johannes (November 2024, IEEE)

Full Text Available
OpenMP Kernel Language Extensions for Performance Portable GPU Codes

Tian, Shilei; Scogland, Tom; Chapman, Barbara; Doerfert, Johannes (November 2023, Association for Computing Machinery)
Badia, Rosa M; Mohror, Kathryn (Ed.)
In contemporary high-performance computing architectures, the integration of GPU accelerators has become increasingly prevalent. To harness the full potential of these accelerators, developers often resort to vendor-specific kernel languages, such as CUDA. While this approach ensures optimal efficiency, it inherently compromises portability and engenders vendor dependency. Existing portable programming models, such as OpenMP, while promising, demand extensive code rewriting due to their foundamental difference from kernel languages. In this work, we introduce extensions to LLVM OpenMP, transforming it into a versatile and performance portable kernel language for GPU programming. These extensions allow for the seamless porting of programs from kernel languages to high-performance OpenMP GPU programs with minimal modifications. To evaluate our extension, we implemented a proof-of-concept prototype that contains a subset of extensions we proposed. We ported six established CUDA proxy and benchmark applications and evaluated their performance on both AMD and NVIDIA platforms. By comparing with native versions (HIP and CUDA), our results show that OpenMP, augmented with our extensions, can not only match but also in some cases exceed the performance of kernel languages, thereby offering performance portability with minimal effort from application developers.
more » « less
Full Text Available
Implementing OpenMP’s SIMD Directive in LLVM’s GPU Runtime

https://doi.org/10.1145/3605573.3605640

Wright, Eric; Doerfert, Johannes; Tian, Shilei; Chapman, Barbara; Chandrasekaran, Sunita (August 2023, ACM)

Full Text Available
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

https://doi.org/10.1145/3572848.3577475

Moses, William S; Ivanov, Ivan R; Domke, Jens; Endo, Toshio; Doerfert, Johannes; Zinenko, Oleksandr (February 2023, ACM)
NA (Ed.)
While parallelism remains the main source of performance,architectural implementations and programming modelschange with each new hardware generation, often leadingto costly application re-engineering. Most tools for perfor-mance portability require manual and costly application port-ing to yet another programming model.We propose an alternative approach that automaticallytranslates programs written in one programming model(CUDA), into another (CPU threads) based on Polygeist/MLIR.Our approach includes a representation of parallel constructsthat allows conventional compiler transformations to ap-ply transparently and without modification a nd enablesparallelism-specific optimizations. We evaluate our frame-work by transpiling and optimizing the CUDA Rodinia bench-mark suite for a multi-core CPU and achieve a 58% geomeanspeedup over handwritten OpenMP code. Further, we showhow CUDA kernels from PyTorch can efficiently run andscale on the CPU-only Supercomputer Fugaku without userintervention. Our PyTorch compatibility layer making use oftranspiled CUDA PyTorch kernels outperforms the PyTorchCPU native backend by 2.7×.
more » « less
Full Text Available
SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development

https://doi.org/10.1145/3582016.3582058

Tan, Zujun; Chon, Yebin; Kruse, Michael; Doerfert, Johannes; Xu, Ziyang; Homerding, Brian; Campanoni, Simone; August, David I. (March 2023, International Conference on Architectural Support for Programming Languages and Operating Systems)

Manually writing parallel programs is difficult and error-prone. Automatic parallelization could address this issue, but profitability can be limited by not having facts known only to the programmer. A parallelizing compiler that collaborates with the programmer can increase the coverage and performance of parallelization while reducing the errors and overhead associated with manual parallelization. Unlike collaboration involving analysis tools that report program properties or make parallelization suggestions to the programmer, decompiler-based collaboration could leverage the strength of existing parallelizing compilers to provide programmers with a natural compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with any compiler of the source language. This paper presents SPLENDID, an LLVM-IR to C/OpenMP decompiler that enables collaborative parallelization by producing standard parallel OpenMP code. Using published manual parallelization of the PolyBench benchmark suite as a reference, SPLENDID's collaborative approach produces programs twice as fast as either Polly-based automatic parallelization or manual parallelization alone. SPLENDID's portable parallel code is also more natural than that from existing decompilers, obtaining a 39x higher average BLEU score.
more » « less
Full Text Available
Scalable Automatic Differentiation of Multiple Parallel Paradigms through Compiler Augmentation

https://doi.org/10.1109/SC41404.2022.00065

Moses, William S.; Narayanan, Sri Hari; Paehler, Ludger; Churavy, Valentin; Schanen, Michel; Hückelheim, Jan; Doerfert, Johannes; Hovland, Paul (November 2022, IEEE)

Full Text Available
Reverse-mode automatic differentiation and optimization of GPU kernels via enzyme

https://doi.org/10.1145/3458817.3476165

Moses, William S.; Churavy, Valentin; Paehler, Ludger; Hückelheim, Jan; Narayanan, Sri Hari; Schanen, Michel; Doerfert, Johannes (November 2021, SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis)

Computing derivatives is key to many algorithms in scientific computing and machine learning such as optimization, uncertainty quantification, and stability analysis. Enzyme is a LLVM compiler plugin that performs reverse-mode automatic differentiation (AD) and thus generates high performance gradients of programs in languages including C/C++, Fortran, Julia, and Rust. Prior to this work, Enzyme and other AD tools were not capable of generating gradients of GPU kernels. Our paper presents a combination of novel techniques that make Enzyme the first fully automatic reversemode AD tool to generate gradients of GPU kernels. Since unlike other tools Enzyme performs automatic differentiation within a general-purpose compiler, we are able to introduce several novel GPU and AD-specific optimizations. To show the generality and efficiency of our approach, we compute gradients of five GPU-based HPC applications, executed on NVIDIA and AMD GPUs. All benchmarks run within an order of magnitude of the original program's execution time. Without GPU and AD-specific optimizations, gradients of GPU kernels either fail to run from a lack of resources or have infeasible overhead. Finally, we demonstrate that increasing the problem size by either increasing the number of threads or increasing the work per thread, does not substantially impact the overhead from differentiation.
more » « less
Full Text Available
OpenMP application experiences: Porting to accelerated nodes

https://doi.org/10.1016/j.parco.2021.102856

Bak, Seonmyeong; Bertoni, Colleen; Boehm, Swen; Budiardja, Reuben; Chapman, Barbara M; Doerfert, Johannes; Eisenbach, Markus; Finkel, Hal; Hernandez, Oscar; Huber, Joseph; et al (March 2022, Parallel Computing)

Full Text Available

Search for: All records